Merge RedistributeCPU and RedistributeGPU into one implementation #5268

Merged
atmyers merged 49 commits into AMReX-Codes:development from atmyers:redist_gpu_tiling on Apr 22, 2026

atmyers (Member) commented Apr 3, 2026

This merges RedistributeCPU and RedistributeGPU into one shared implementation that works for both, improving the maintainability of the code base. It also restructures the way OpenMP parallelism works in Redistribute on the CPU, resulting in better OpenMP performance and scaling. Another consequence is that particle tiling is now supported on GPU (although probably not desirable in most cases).

The plot shows performance results on a redistribute benchmark with 2 MPI ranks on a Perlmutter CPU node as a function of the number of OpenMP threads. "run" is this branch, "dev" is development. The new version is faster at every thread count and, at high thread counts, is roughly 2x faster or better.

When compiled for CPU with USE_OMP=FALSE, the new implementation is about 25% faster on that same benchmark, mostly owing to a new early exit in the partition step. The difference is more dramatic in cases with more particles per cell, like this example from WarpX.

[Plot: redist_opt_plot]
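
For readers unfamiliar with the idea, here is a minimal sketch of what an early exit in a partition step can look like. It is not the AMReX code: the `Particle` struct, the `dest_tile` field, and the `partition_with_early_exit` function are hypothetical names made up for illustration, and the real implementation may detect the "nothing leaves" case differently. The sketch only assumes that a read-only scan is cheaper than a partition that moves particle data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle record: only the destination tile index matters here.
struct Particle {
    int dest_tile;  // tile this particle should live on after redistribution
};

// Moves the particles that stay on `my_tile` to the front of `ptile`
// (keeping their relative order) and returns how many stay.
// If no particle has to leave, it returns before any reordering happens.
std::size_t partition_with_early_exit(std::vector<Particle>& ptile, int my_tile)
{
    assert(my_tile >= 0);  // illustrative precondition, not an AMReX rule

    auto stays = [my_tile](const Particle& p) { return p.dest_tile == my_tile; };

    // Early exit: a read-only scan, so no particle data is moved.
    if (std::all_of(ptile.begin(), ptile.end(), stays)) {
        return ptile.size();
    }

    // General case: gather the staying particles at the front.
    auto first_leaving = std::stable_partition(ptile.begin(), ptile.end(), stays);
    return static_cast<std::size_t>(first_leaving - ptile.begin());
}
```

In a run where most tiles lose no particles per step, the scan-only branch would be taken most of the time, which is the kind of situation where such a shortcut could pay off.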

Todo:

  • Add support for tiling to RedistributeGPU
  • Rename RedistributeGPU to Redistribute_impl and make both CPU and GPU code paths go through it
  • Add support for OpenMP to Redistribute_impl.
  • Remove RedistributeCPU
  • Performance tests
  • Clean up / finalize
  • Run full application regtest suite.

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

atmyers changed the title from "[WIP] Add support for tiling in RedistributeGPU" to "[WIP] Merge RedistributeCPU and RedistributeGPU into one implementation" on Apr 8, 2026
atmyers marked this pull request as draft on April 8, 2026, 18:25
@amrex-gitlab-ci-reporter

GitLab CI 1529965 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1529965.

atmyers (Member, Author) commented Apr 17, 2026

/run-hpsf-gitlab-ci

@github-actions

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1532660.

@amrex-gitlab-ci-reporter

GitLab CI 1532660 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1532660.

atmyers (Member, Author) commented Apr 20, 2026

Note: the regression tests for AMReX apps on gaira, garuda, and biollante look good, as long as these PRs are merged along with this one:
AMReX-Astro/Nyx#109
erf-model/ERF#3123

atmyers (Member, Author) commented Apr 20, 2026

I also confirmed that the GPU performance has not regressed, despite doing a little bit more work to support tiling:

This PR:

ParticleContainer::Redistribute_impl()          503      2.878       2.95      2.997  58.06%

Dev:

ParticleContainer::RedistributeGPU()            503      2.896      2.939      2.984  57.94%

atmyers (Member, Author) commented Apr 21, 2026

I have added back in an assertion that tiling is off on the GPU if neighbor particles are used. I will remove this limitation and add a better neighbor particles test in a follow-up PR.

atmyers merged commit d10a21a into AMReX-Codes:development on Apr 22, 2026
72 checks passed
atmyers added a commit that referenced this pull request Apr 23, 2026
PR #5268 allowed particle tiling on GPU in Redistribute, but did not
extend this support to neighbor particles. This PR adds support for
this. It also extends the existing test to support tiling and to more
carefully check that the ghosted particle data is exactly right. Note
that this test now reproduces all the particles on all ranks and
therefore isn't suitable for running on many ranks.

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate